Abstract

We study the computational complexity of the parsing problem of a variant of Lambek Categorial Grammar that we call semidirectional. In the semidirectional Lambek calculus SDL there is an additional nondirectional abstraction rule allowing the formula abstracted over to appear anywhere in the premise sequent's left-hand side, thus permitting non-peripheral extraction. SDL grammars are able to generate every context-free language and more than that. We show that the parsing problem for semidirectional Lambek Grammar is NP-complete by a reduction of the 3-Partition problem.
1 Introduction

Categorial Grammar (CG) and in particular Lambek Categorial Grammar (LCG) have well-known benefits for the formal treatment of natural language syntax and semantics. The most outstanding of these benefits is probably that the specific way in which the complete grammar is encoded, namely in terms of 'combinatory potentials' of its words, at the same time gives us recipes for the construction of meanings once the words have been combined with others to form larger linguistic entities. Although both frameworks are equivalent in weak generative capacity (both derive exactly the context-free languages), LCG is superior to CG in that it can cope in a natural way with extraction and unbounded dependency phenomena. For instance, no special category assignments need to be stipulated to handle a relative clause containing a trace, because it is analyzed, via hypothetical reasoning, like a traceless clause with the trace being the hypothesis to be discharged when combined with the relative pronoun. Figure 1 illustrates this proof-logical behaviour. Notice that this natural-deduction-style proof in the type logic corresponds very closely to the phrase-structure tree one would like to adopt in an analysis with traces. We can thus derive Bill misses e as an s from the hypothesis that there is a "phantom" np in the place of the trace. Discharging the hypothesis, indicated by index 1, results in Bill misses being analyzed as an s/np from zero hypotheses. Observe, however, that such a bottom-up synthesis of a new unsaturated type is only required if that type is to be consumed (as the antecedent of an implication) by another type. Otherwise there would be a simpler proof without this abstraction. In our example the relative pronoun has such a complex type, triggering an extraction.

A drawback of the pure Lambek Calculus L is that it only allows for so-called 'peripheral extraction', i.e., in our example the trace would have to be initial or final in the relative clause.

This inflexibility of the Lambek Calculus is one of the reasons why many researchers study richer systems today. For instance, the recent work by Moortgat (Moortgat 94) gives a systematic in-depth study of mixed Lambek systems, which integrate the systems L, NL, NLP, and LP. These ingredient systems are obtained by varying the Lambek calculus along two dimensions: adding the permutation rule (P) and/or dropping the assumption that the type combinator (which forms the sequences the systems talk about) is associative (N for non-associative).

Taken by themselves, these variants of L are of little use in linguistic descriptions. But in Moortgat's mixed system all the different resource management modes of the different systems are left intact in the combination and can be exploited in different parts of the grammar. The relative pronoun which would, for instance, receive category (np\np)/(np -o s) with -o being implication in LP,¹ i.e., it requires as an argument "an s lacking an np somewhere".²

Figure 1: Extraction as resource-conscious hypothetical reasoning

The present paper studies the computational complexity of a variant of the Lambek Calculus that lies between L and LP, the Semidirectional Lambek Calculus SDL.³ Since LP derivability is known to be NP-complete, it is interesting to study restrictions on the use of the LP operator -o. A restriction that leaves its proposed linguistic applications intact is to admit a type B -o A only as the argument type in functional applications, but never as the functor. Stated proof-theoretically for Gentzen-style systems, this amounts to disallowing the left rule for -o. Surprisingly, the resulting system SDL can be stated without the need for structural rules, i.e., as a monolithic system with just one structural connective, because the ability of the abstracted-over formula to permute can be directly encoded in the right rule for -o.⁴

Note that our purpose for studying SDL is not that it might be in any sense better suited for a theory of grammar (except perhaps because of its simplicity), but rather that it exhibits a core of logical behaviour that any richer system also needs to include, at least if it is to allow for non-peripheral extraction. The sources of complexity uncovered here are thus a fortiori present in all these richer systems as well.

¹ The Lambek calculus with permutation LP is also called the "nondirectional Lambek calculus" (Benthem 88). In it the leftward and rightward implications collapse.

² Morrill (Morrill 94) achieves the same effect with a permutation modality △ applied to the np gap: (s/△np).

³ This name was coined by Esther König-Baumer, who employs a variant of this calculus in her LexGram system (König 95) for practical grammar development.

⁴ It should be pointed out that the resource management in this calculus is very closely related to the handling and interaction of local valency and unbounded dependencies in HPSG. The latter, being handled with the set-valued features SLASH, QUE and REL, essentially emulates the permutation potential of abstracted categories in semidirectional Lambek Grammar. A more detailed analysis of the relation between HPSG and SDL is given in (König 95).

2 Semidirectional Lambek Grammar 
2.1 Lambek calculus 

The semidirectional Lambek calculus (henceforth SDL) is a variant of J. Lambek's original (Lambek 58) calculus of syntactic types. We start by defining the Lambek calculus and extend it to obtain SDL.

Formulae (also called "syntactic types") are built from a set of propositional variables (or "primitive types") B = {b_1, b_2, ...} and the three binary connectives •, \, /, called product, left implication, and right implication. We generally use capital letters A, B, C, ... to denote formulae and capitals towards the end of the alphabet T, U, V, ... to denote sequences of formulae. The concatenation of sequences U and V is denoted by (U, V).

The (usual) formal framework of these logics is a Gentzen-style sequent calculus. Sequents are pairs (U, A), written as U ⇒ A, where A is a type and U is a sequence of types.⁵ The claim embodied by the sequent U ⇒ A can be read as "formula A is derivable from the structured database U". Figure 2 shows Lambek's original calculus L.

⁵ In contrast to Linear Logic (Girard 87), the order of types in U is essential, since the structural rule of permutation is not assumed to hold. Moreover, the fact that only a single formula may appear on the right of ⇒ makes the Lambek calculus an intuitionistic fragment of the multiplicative fragment of non-commutative propositional Linear Logic.

First of all, since we don't need products to obtain our results and since they only complicate matters, we eliminate products from consideration in the sequel.

In the Semidirectional Lambek Calculus we add as an additional connective the LP implication -o, but equip it only with a right rule:

$$\frac{U, B, V \Rightarrow A}{U, V \Rightarrow B \multimap A}\;(\multimap R) \qquad \text{if } (U, V) \text{ is nonempty}$$

Let us define the polarity of a subformula of a sequent A_1, ..., A_n ⇒ A as follows: A has positive polarity, each A_i has negative polarity, and if B/C, C\B or C -o B has polarity p, then B also has polarity p and C has the opposite polarity of p in the sequent.

A consequence of only allowing the (-o R) rule, which is easily proved by induction, is that in any derivable sequent -o may only appear in positive polarity. Hence, -o may not occur in the cut formula A of a (Cut) application, and any subformula B -o A which occurs somewhere in the proof must also occur in the final sequent. When we assume the final sequent's RHS to be primitive (or -o-less), the (-o R) rule will be used exactly once for each (positively) occurring -o-subformula. In other words, (-o R) may only do what it is supposed to do: extraction, and we can directly read off from the category assignment which extractions there will be.
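The polarity definition and the resulting restriction on -o can be checked mechanically. The Python sketch below is only an illustration under our own assumptions: formulae are encoded as strings (primitive types) or triples, the function names are hypothetical, and C -o B is treated like C\B for polarity, matching the treatment of -o in the b-count. The test it implements is a necessary condition for SDL-derivability, not a decision procedure.

```python
from typing import Iterator, Sequence, Tuple, Union

# Hypothetical encoding: a primitive type is a str; implications are triples in
# print order: ('/', B, C) = B/C, ('\\', C, B) = C\B, ('-o', C, B) = C -o B.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def occurrences(f: Formula, p: int) -> Iterator[Tuple[Formula, int]]:
    """Yield each subformula occurrence of f with its polarity (+1 or -1)."""
    yield f, p
    if isinstance(f, str):
        return
    op, left, right = f
    if op == '/':               # B/C: B keeps polarity p, C gets the opposite
        yield from occurrences(left, p)
        yield from occurrences(right, -p)
    elif op in ('\\', '-o'):    # C\B and C -o B: C gets the opposite, B keeps p
        yield from occurrences(left, -p)
        yield from occurrences(right, p)
    else:
        raise ValueError(f"unknown connective {op!r}")


def sequent_occurrences(lhs: Sequence[Formula],
                        rhs: Formula) -> Iterator[Tuple[Formula, int]]:
    """For A_1, ..., A_n => A: every A_i is negative, A is positive."""
    for f in lhs:
        yield from occurrences(f, -1)
    yield from occurrences(rhs, +1)


def lolli_only_positive(lhs: Sequence[Formula], rhs: Formula) -> bool:
    """Necessary condition for SDL-derivability: -o occurs only positively."""
    return all(p == +1 for f, p in sequent_occurrences(lhs, rhs)
               if not isinstance(f, str) and f[0] == '-o')


which = ('/', ('\\', 'np', 'np'), ('-o', 'np', 's'))   # (np\np)/(np -o s)
misses = ('/', ('\\', 'np', 's'), 'np')                 # (np\s)/np
print(lolli_only_positive([which, 'np', misses], ('\\', 'np', 'np')))  # True
print(lolli_only_positive([('-o', 'np', 's'), 'np'], 's'))            # False
```

The first sequent is the relative-clause case which, Bill, misses ⇒ np\np from the introduction, where -o sits in the argument position of the relative pronoun and is therefore positive; the second places a bare -o on the left-hand side, so by the property above it cannot be SDL-derivable.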

We can show Cut Elimination for this calculus by a straightforward adaptation of the Cut Elimination proof for L. We omit the proof for reasons of space.

Proposition 1 (Cut Elimination) Each 
SDL-derivable sequent has a cut-free proof. 

The cut-free system enjoys, as usual for Lambek-like logics, the Subformula Property: in any proof only subformulae of the goal sequent may appear.

In our considerations below we will make heavy use of the well-known count invariant for Lambek systems (Benthem 88), which is an expression of the resource-consciousness of these logics. Define #_b(A) (the b-count of A), a function that counts positive occurrences of the primitive type b in an arbitrary type A as +1 and negative ones as -1, to be

$$\#_b(A) = \begin{cases} 1 & \text{if } A = b \\ 0 & \text{if } A \text{ is primitive and } A \neq b \\ \#_b(B) - \#_b(C) & \text{if } A = B/C \text{ or } A = C\backslash B \text{ or } A = C \multimap B \\ \#_b(B) + \#_b(C) & \text{if } A = B \bullet C \end{cases}$$



The invariant now states that for any primitive b, the b-counts of the RHS and the LHS of any derivable sequent are the same. Noting that this invariant holds for (Ax) and is preserved by the rules, we can immediately state:



Proposition 2 (Count Invariant) If ⊢_SDL U ⇒ A, then #_b(U) = #_b(A) for any b ∈ B, where #_b(U) is the sum of the b-counts of the formulae in U.
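Proposition 2 yields a cheap necessary test for derivability. The following Python sketch implements #_b clause by clause as defined above (including the product clause) and compares the summed counts on both sides of a sequent. The formula encoding and the function names are our own illustrative assumptions.

```python
from typing import Sequence, Set, Tuple, Union

# Hypothetical encoding: primitive types are str; ('/', B, C) = B/C,
# ('\\', C, B) = C\B, ('-o', C, B) = C -o B, ('*', B, C) = B.C (product).
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def count(b: str, f: Formula) -> int:
    """The b-count #_b(f), following the case distinction above."""
    if isinstance(f, str):
        return 1 if f == b else 0
    op, left, right = f
    if op == '*':              # B.C
        return count(b, left) + count(b, right)
    if op == '/':              # B/C
        return count(b, left) - count(b, right)
    if op in ('\\', '-o'):     # C\B and C -o B
        return count(b, right) - count(b, left)
    raise ValueError(f"unknown connective {op!r}")


def primitives(f: Formula) -> Set[str]:
    """All primitive types occurring in f."""
    if isinstance(f, str):
        return {f}
    return primitives(f[1]) | primitives(f[2])


def counts_balanced(lhs: Sequence[Formula], rhs: Formula) -> bool:
    """#_b(U) = #_b(A) for every primitive b: necessary, not sufficient."""
    prims = primitives(rhs).union(*(primitives(f) for f in lhs))
    return all(sum(count(b, f) for f in lhs) == count(b, rhs) for b in prims)


misses = ('/', ('\\', 'np', 's'), 'np')                  # (np\s)/np
print(counts_balanced(['np', misses], ('/', 's', 'np')))  # True
print(counts_balanced(['np'], 's'))                      # False
```

A failed test proves non-derivability; a passed test proves nothing by itself.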



Let us, in parallel to SDL, consider the fragment of it in which (/R) and (\R) are disallowed. We call this fragment SDL⁻. What is remarkable about this fragment is that any positive occurrence of an implication must be -o and any negative one must be / or \.

2.2 Lambek Grammar 

Definition 3 We define a Lambek grammar to be a quadruple (Σ, F, b_s, l) consisting of the finite alphabet of terminals Σ, the set F of all Lambek formulae generated from some set of propositional variables which includes the distinguished variable b_s, and the lexical map l : Σ → 2^F, which maps each terminal to a finite subset of F.

We extend the lexical map l to nonempty strings of terminals by setting l(w_1 w_2 ... w_n) := l(w_1) × l(w_2) × ... × l(w_n) for w_1 w_2 ... w_n ∈ Σ⁺.
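The extension of l to strings is a Cartesian product, which the following Python sketch computes with itertools.product. The function name and the toy lexicon are hypothetical; lists stand in for the finite sets l(w) only to fix an enumeration order.

```python
from itertools import product
from typing import Any, Dict, List, Sequence, Tuple


def lexical_sequences(lexicon: Dict[str, List[Any]],
                      word: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Compute l(w_1 ... w_n) = l(w_1) x ... x l(w_n) for a nonempty string.

    `lexicon` maps each terminal to its finite list of types (a hypothetical
    stand-in for the set l(w)).
    """
    if len(word) == 0:
        raise ValueError("l is only extended to nonempty strings")
    return list(product(*(lexicon[w] for w in word)))


lexicon = {'a': ['A1', 'A2'], 'b': ['B1', 'B2']}
print(lexical_sequences(lexicon, ['a', 'b']))
# [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
```

The result has |l(w_1)| · ... · |l(w_n)| elements, so a recognizer that naively tries every sequence in l(w_1 ... w_n) performs exponentially many derivability checks as soon as words are lexically ambiguous.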

The language generated by a Lambek grammar G = (Σ, F, b_s, l) is defined as the set of all strings w_1 w_2 ... w_n ∈ Σ⁺ for which there exists a sequence of types U ∈ l(w_1 w_2 ... w_n) such that ⊢_L U ⇒ b_s. We denote this language by L(G).

An SDL-grammar is defined exactly like a Lambek grammar, except that ⊢_SDL replaces ⊢_L.

Given a grammar G and a string w = w_1 w_2 ... w_n, the parsing (or recognition) problem asks whether w is in L(G).

It is not immediately obvious how the generative capacity of SDL-grammars relates to that of Lambek grammars or nondirectional Lambek grammars (based on the calculus LP). Whereas Lambek grammars generate exactly the context-free languages (modulo the missing empty word) (Pentus 93), the latter generate all permutation closures of context-free languages (Benthem 88). This excludes many context-free or even regular languages, but includes some context-sensitive ones, e.g., the permutation closure of a^n b^n c^n.

Concerning SDL, it is straightforward to show that 
all context-free languages can be generated by SDL- 
grammars. 



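$$\frac{}{A \Rightarrow A}\;(Ax) \qquad \frac{T \Rightarrow A \quad U, A, V \Rightarrow C}{U, T, V \Rightarrow C}\;(Cut)$$

$$\frac{T \Rightarrow B \quad U, A, V \Rightarrow C}{U, A/B, T, V \Rightarrow C}\;(/L) \qquad \frac{U, B \Rightarrow A}{U \Rightarrow A/B}\;(/R)\ \text{if } U \text{ nonempty}$$

$$\frac{T \Rightarrow B \quad U, A, V \Rightarrow C}{U, T, B\backslash A, V \Rightarrow C}\;(\backslash L) \qquad \frac{B, U \Rightarrow A}{U \Rightarrow B\backslash A}\;(\backslash R)\ \text{if } U \text{ nonempty}$$

$$\frac{U, A, B, V \Rightarrow C}{U, A \bullet B, V \Rightarrow C}\;(\bullet L) \qquad \frac{U \Rightarrow A \quad V \Rightarrow B}{U, V \Rightarrow A \bullet B}\;(\bullet R)$$

Figure 2: Lambek calculus L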
Figure 3: Proof of A_1^n, A_2, B_1^n, B_2, C_1^n, C_2 ⇒ x



Proposition 4 Every context-free language is gen- 
erated by some SDL-grammar. 



Proof. We can use the standard transformation of an arbitrary context-free grammar G = (N, T, P, S) into a categorial grammar G'. Since -o does not appear in G', each SDL-proof of a lexical assignment must also be an L-proof, i.e., exactly the same strings are judged grammatical by SDL as by L. □

Note that since the {(Ax), (/L), (\L)} subset of L already accounts for the context-free languages, this observation extends to SDL⁻.

Moreover, some languages which are not context-free 
can also be generated. 

Example. Consider the following grammar G for the language a^n b^n c^n. We use primitive types B = {b, c, x, y, z} and define the lexical map for Σ = {a, b, c} as follows:

$$\begin{aligned} l(a) &:= \{\, \underbrace{x/(c \multimap (b \multimap x))}_{A_1},\ \underbrace{x/(c \multimap (b \multimap y))}_{A_2} \,\} \\ l(b) &:= \{\, \underbrace{(y/b)/y}_{B_1},\ \underbrace{(y/b)/z}_{B_2} \,\} \\ l(c) &:= \{\, \underbrace{(z/c)/z}_{C_1},\ \underbrace{z/c}_{C_2} \,\} \end{aligned}$$

The distinguished primitive type is x. To simplify the argumentation, we abbreviate types as indicated above.

Now, observe that a sequent U ⇒ x, where U is the image of some string over Σ, can have balanced primitive counts only if U contains exactly one occurrence of each of A_2, B_2 and C_2 (accounting for the one supernumerary x and balanced y and z counts) and, for some number n ≥ 0, n occurrences of each of A_1, B_1 and C_1 (because, speaking in resource terms, each B_i and C_i "consumes" a b and a c, respectively, and each A_i "provides" a pair b, c). Hence, only strings containing the same number of a's, b's and c's may be produced. Furthermore, due to the Subformula Property we know that in a cut-free proof of U ⇒ x, the main formula in abstractions (right rules) may only be either c -o (b -o X) or b -o X, where X ∈ {x, y}, since all other implication types have primitive antecedents. Hence, the LHS of any sequent in the proof must be a subsequence of U, with some additional b types and c types interspersed. But then it is easy to show that U can only be of the form

$$A_1^n, A_2, B_1^n, B_2, C_1^n, C_2,$$

since any / connective in U needs to be introduced via (/L).
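The counting part of this argument can be replayed by brute force. The Python sketch below encodes A_1, ..., C_2 in our own tuple notation (names are illustrative), enumerates every U ∈ l(w) for a short string w, and asks whether some U has the primitive counts of the goal x. It implements only the count filter of Proposition 2, not SDL-derivability, so it accepts abcabc as well; excluding that string is the job of the structural argument about (/L).

```python
from itertools import product
from typing import Tuple, Union

# Hypothetical encoding: primitive types are str; ('/', B, C) = B/C and
# ('-o', C, B) = C -o B. The names A1, ..., C2 follow the text.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]

A1 = ('/', 'x', ('-o', 'c', ('-o', 'b', 'x')))   # x/(c -o (b -o x))
A2 = ('/', 'x', ('-o', 'c', ('-o', 'b', 'y')))   # x/(c -o (b -o y))
B1 = ('/', ('/', 'y', 'b'), 'y')                 # (y/b)/y
B2 = ('/', ('/', 'y', 'b'), 'z')                 # (y/b)/z
C1 = ('/', ('/', 'z', 'c'), 'z')                 # (z/c)/z
C2 = ('/', 'z', 'c')                             # z/c
LEXICON = {'a': [A1, A2], 'b': [B1, B2], 'c': [C1, C2]}
PRIMITIVES = ('b', 'c', 'x', 'y', 'z')


def count(b: str, f: Formula) -> int:
    """#_b(f) for formulae built with / and -o only."""
    if isinstance(f, str):
        return 1 if f == b else 0
    op, left, right = f
    if op == '/':                            # B/C
        return count(b, left) - count(b, right)
    return count(b, right) - count(b, left)  # C -o B


def has_balanced_choice(word: str, goal: Formula = 'x') -> bool:
    """Does some U in l(word) satisfy #_b(U) = #_b(goal) for every primitive b?

    Exhaustive search over l(w_1) x ... x l(w_n): exponential in len(word).
    """
    for U in product(*(LEXICON[ch] for ch in word)):
        if all(sum(count(b, f) for f in U) == count(b, goal) for b in PRIMITIVES):
            return True
    return False


print(has_balanced_choice('aabbcc'))  # True
print(has_balanced_choice('aabbc'))   # False
print(has_balanced_choice('abcabc'))  # True: counts ignore word order
```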

It remains to be shown that there is actually a proof for such a sequent. It is given in Figure 3.

The sequent marked with * is easily seen to be derivable without abstractions.

A remarkable point about SDL's ability to cover this language is that neither L nor LP can generate it. Hence, this example substantiates the claim made in (Moortgat 94) that the inferential capacity of mixed Lambek systems may be greater than the sum of their component parts. Moreover, the attentive reader will have noticed that our encoding also extends to languages having more groups of n symbols, i.e., to languages of the form a_1^n a_2^n ... a_k^n.

Finally, we note in passing that for this grammar the rules (/R) and (\R) are irrelevant, i.e., that it is at the same time an SDL⁻ grammar.



3 NP-Completeness of the Parsing Problem

We show that the parsing problem for SDL-grammars is NP-complete by a reduction of the 3-Partition problem to it.⁶ This well-known NP-complete problem is cited in (Garey Johnson 79) as follows.

Instance: Set A of 3m elements, a bound N ∈ Z⁺, and a size s(a) ∈ Z⁺ for each a ∈ A such that N/4 < s(a) < N/2 and Σ_{a∈A} s(a) = mN.

Question: Can A be partitioned into m disjoint sets A_1, A_2, ..., A_m such that, for 1 ≤ i ≤ m, Σ_{a∈A_i} s(a) = N (note that each A_i must therefore contain exactly 3 elements from A)?

Comment: NP-complete in the strong sense.

Here is our reduction. Let T = (A, m, N, s) be a given 3-Partition instance, and write A = {a_1, ..., a_3m}. For notational convenience we abbreviate (...((A/B_1)/B_2)/...)/B_n by A/B_n • ... • B_2 • B_1 and similarly B_n -o (...(B_1 -o A)...) by B_n • ... • B_2 • B_1 -o A, but note that this is just an abbreviation in the product-free fragment. Moreover, the notation A^k stands for

$$A^k := \underbrace{A \bullet A \bullet \cdots \bullet A}_{k \text{ times}}$$

We then define the SDL-grammar G_T = (Σ, F, b_s, l) as follows:

$$\begin{aligned} \Sigma &:= \{v, w_1, \ldots, w_{3m}\} \\ F &:= \text{all formulae over the primitive types } B = \{a, d\} \cup \textstyle\bigcup_{i=1}^{m} \{b_i, c_i\} \\ b_s &:= a \\ l(v) &:= \{\, a/(b_1^3 \bullet \cdots \bullet b_m^3 \bullet c_1^N \bullet \cdots \bullet c_m^N \multimap d) \,\} \\ l(w_i) &:= \textstyle\bigcup_{1 \le j \le m} \{\, d/d \bullet b_j \bullet c_j^{s(a_i)} \,\} \qquad \text{for } 1 \le i \le 3m-1 \\ l(w_{3m}) &:= \textstyle\bigcup_{1 \le j \le m} \{\, d/b_j \bullet c_j^{s(a_{3m})} \,\} \end{aligned}$$

The word we are interested in is v w_1 w_2 ... w_3m. We do not care about other words that might be generated by G_T. Our claim now is that a given 3-Partition problem T is solvable if and only if v w_1 ... w_3m is in L(G_T). We consider each direction in turn.



⁶ A similar reduction has been used in (Lincoln Winkler 94) to show that derivability in the multiplicative fragment of propositional Linear Logic with only the connectives -o and ⊗ (equivalently, Lambek calculus with permutation LP) is NP-complete.
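To make the source problem concrete, here is a brute-force solver for tiny 3-Partition instances in Python. It is an illustrative sketch with hypothetical names, exponential in m, and plays no role in the reduction itself; elements of A are represented by 0-based indices into the list of sizes.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def three_partition(sizes: Sequence[int], m: int,
                    N: int) -> Optional[List[Tuple[int, int, int]]]:
    """Return m index triples whose sizes each sum to N, or None if none exist."""
    if len(sizes) != 3 * m or sum(sizes) != m * N:
        raise ValueError("not a 3-Partition instance: need 3m sizes summing to mN")
    if not all(N < 4 * s and 2 * s < N for s in sizes):
        raise ValueError("sizes must satisfy N/4 < s(a) < N/2")

    def solve(rest: List[int]) -> Optional[List[Tuple[int, int, int]]]:
        if not rest:
            return []
        first, others = rest[0], rest[1:]
        # `first` must belong to some triple: try every pair of partners.
        for j, k in combinations(range(len(others)), 2):
            if sizes[first] + sizes[others[j]] + sizes[others[k]] == N:
                remaining = [x for t, x in enumerate(others) if t not in (j, k)]
                sub = solve(remaining)
                if sub is not None:
                    return [(first, others[j], others[k])] + sub
        return None

    return solve(list(range(len(sizes))))


print(three_partition([6, 7, 7, 6, 7, 7], m=2, N=20))  # [(0, 1, 2), (3, 4, 5)]
print(three_partition([6, 6, 6, 6, 7, 9], m=2, N=20))  # None
```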



Lemma 5 (Soundness) If a 3-Partition problem T = (A, m, N, s) has a solution, then v w_1 ... w_3m is in L(G_T).

Proof. We have to show, given a solution to T, how to choose a type sequence U ∈ l(v w_1 ... w_3m) and construct an SDL proof for U ⇒ a. Suppose A = {a_1, a_2, ..., a_3m}. From a given solution (set of triples) A_1, A_2, ..., A_m we can compute in polynomial time a mapping k that sends the index of an element to the index of its solution triple, i.e., k(i) = j iff a_i ∈ A_j. To obtain the required sequence U, we simply choose for the w_i terminals the type d/d • b_{k(i)} • c_{k(i)}^{s(a_i)} (resp. d/b_{k(3m)} • c_{k(3m)}^{s(a_3m)} for w_3m).

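Hence the complete sequent to solve is:

$$\begin{aligned} & a/(b_1^3 \bullet \cdots \bullet b_m^3 \bullet c_1^N \bullet \cdots \bullet c_m^N \multimap d), \\ & d/d \bullet b_{k(1)} \bullet c_{k(1)}^{s(a_1)},\ \ldots,\ d/d \bullet b_{k(3m-1)} \bullet c_{k(3m-1)}^{s(a_{3m-1})}, \\ & d/b_{k(3m)} \bullet c_{k(3m)}^{s(a_{3m})} \;\Rightarrow\; a \end{aligned} \qquad (*)$$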
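The sequence U chosen in the proof of Lemma 5 can be assembled mechanically. The Python sketch below builds the left-hand side of (*) from a solution given as 0-based index triples, using helper functions slash and lolli for the two abbreviations introduced with the reduction, and then checks the count invariant. The encoding and all names (including the primitive names b1, c1, ...) are our own assumptions, and the check covers only the balance condition, not the (/L) and (-o R) steps of the proof.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical encoding: primitive types are str; ('/', B, C) = B/C and
# ('-o', C, B) = C -o B. Primitive names 'a', 'd', 'b1', 'c1', ... are ours.
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def slash(a: Formula, args: Sequence[Formula]) -> Formula:
    """A/B_n . ... . B_1 := (...((A/B_1)/B_2)/...)/B_n, given args = [B_n, ..., B_1]."""
    for arg in reversed(args):
        a = ('/', a, arg)
    return a


def lolli(args: Sequence[Formula], a: Formula) -> Formula:
    """B_n . ... . B_1 -o A := B_n -o (...(B_1 -o A)...), given args = [B_n, ..., B_1]."""
    for arg in reversed(args):
        a = ('-o', arg, a)
    return a


def v_type(m: int, N: int) -> Formula:
    """a/(b_1^3 . ... . b_m^3 . c_1^N . ... . c_m^N -o d)."""
    gap = [f"b{j}" for j in range(1, m + 1) for _ in range(3)]
    gap += [f"c{j}" for j in range(1, m + 1) for _ in range(N)]
    return ('/', 'a', lolli(gap, 'd'))


def w_type(i: int, j: int, sizes: Sequence[int], m: int) -> Formula:
    """Type chosen for w_i (1-based) when a_i goes to triple j."""
    args = [f"b{j}"] + [f"c{j}"] * sizes[i - 1]
    return slash('d', args if i == 3 * m else ['d'] + args)


def lhs_of_star(sizes: Sequence[int], m: int, N: int,
                triples: Sequence[Tuple[int, int, int]]) -> List[Formula]:
    """a/B_0, B_1, ..., B_3m, with k(i) = j iff a_i lies in the j-th triple."""
    k: Dict[int, int] = {i + 1: j for j, t in enumerate(triples, start=1) for i in t}
    return [v_type(m, N)] + [w_type(i, k[i], sizes, m) for i in range(1, 3 * m + 1)]


def count(b: str, f: Formula) -> int:
    """#_b(f) for formulae built with / and -o only."""
    if isinstance(f, str):
        return 1 if f == b else 0
    op, left, right = f
    if op == '/':
        return count(b, left) - count(b, right)
    return count(b, right) - count(b, left)


def prims(f: Formula) -> set:
    return {f} if isinstance(f, str) else prims(f[1]) | prims(f[2])


def balanced(lhs: Sequence[Formula], rhs: Formula = 'a') -> bool:
    """Count invariant only: necessary, not sufficient, for derivability."""
    names = prims(rhs).union(*map(prims, lhs))
    return all(sum(count(b, f) for f in lhs) == count(b, rhs) for b in names)


sizes = [6, 7, 7, 6, 7, 7]   # m = 2, N = 20
print(balanced(lhs_of_star(sizes, 2, 20, [(0, 1, 2), (3, 4, 5)])))  # True
print(balanced(lhs_of_star(sizes, 2, 20, [(0, 1, 3), (2, 4, 5)])))  # False
```

In the second call the two groups have size sums 19 and 21, so the c_1 and c_2 counts cannot cancel.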



Let a/B_0, B_1, ..., B_3m ⇒ a be a shorthand for (*), and let X stand for the sequence of primitive types

$$X = b_{k(3m)}, c_{k(3m)}^{s(a_{3m})}, b_{k(3m-1)}, c_{k(3m-1)}^{s(a_{3m-1})}, \ldots, b_{k(1)}, c_{k(1)}^{s(a_1)}$$

where, inside a sequence, c^k denotes k consecutive occurrences of c. Using rule (/L) only, we can obviously prove B_1, ..., B_3m, X ⇒ d. Now, applying (-o R) 3m + Nm times, we can obtain B_1, ..., B_3m ⇒ B_0, since there are in total, for each j, 3 b_j and N c_j in X. As final step we have

$$\frac{B_1, \ldots, B_{3m} \Rightarrow B_0 \qquad a \Rightarrow a}{a/B_0, B_1, \ldots, B_{3m} \Rightarrow a}\;(/L)$$

which completes the proof. □

Lemma 6 (Completeness) Let T = (A, m, N, s) be an arbitrary 3-Partition problem and G_T the corresponding SDL-grammar as defined above. Then T has a solution if v w_1 ... w_3m is in L(G_T).

Proof. Let v w_1 ... w_3m ∈ L(G_T) and

$$a/(b_1^3 \bullet \cdots \bullet b_m^3 \bullet c_1^N \bullet \cdots \bullet c_m^N \multimap d), B_1, \ldots, B_{3m} \Rightarrow a$$

be a witnessing derivable sequent, i.e., for 1 ≤ i ≤ 3m, B_i ∈ l(w_i). Now, since the counts of this sequent must be balanced, the sequence B_1, ..., B_3m must contain, for each 1 ≤ j ≤ m, exactly 3 b_j and exactly N c_j as subformulae. Therefore we can read off the solution to T from this sequent by including in A_j (for 1 ≤ j ≤ m) those three elements a_i for which B_i has an occurrence of b_j; say these are a_{j(1)}, a_{j(2)} and a_{j(3)}. We verify, again via balancedness of the primitive counts, that s(a_{j(1)}) + s(a_{j(2)}) + s(a_{j(3)}) = N holds, because the left-hand side is the number of (negative) occurrences of c_j in B_1, ..., B_3m and N is the number of (positive) occurrences of c_j in the type of v. This completes the proof. □

The reduction above proves NP-hardness of the parsing problem. We need strong NP-completeness of 3-Partition here, since our reduction uses a unary encoding. Moreover, the parsing problem also lies within NP, since for a given grammar G cut-free proofs (which suffice by Proposition 1) are linearly bounded in the length of the string, and hence we can simply guess a proof and check it in polynomial time. Therefore we can state the following:

Theorem 7 The parsing problem for SDL is NP-complete.

Finally, we observe that for this reduction the rules (/R) and (\R) are again irrelevant and that we can extend this result to SDL⁻.
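The read-off step in the proof of Lemma 6 is equally mechanical. The following Python sketch (hypothetical names, our own formula encoding, with helpers only for building test input) assigns a_i to A_j whenever the type chosen for w_i contains b_j and then verifies that every A_j is a triple with size sum N. It takes the chosen types as given and does not check that the sequent is derivable.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Hypothetical encoding: ('/', B, C) = B/C; primitive names 'd', 'b1', 'c1', ...
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def prim_occurrences(f: Formula) -> List[str]:
    """All primitive occurrences in f, with repetitions."""
    if isinstance(f, str):
        return [f]
    return prim_occurrences(f[1]) + prim_occurrences(f[2])


def read_off_partition(w_types: Sequence[Formula], sizes: Sequence[int],
                       m: int, N: int) -> Optional[Dict[int, List[int]]]:
    """Put a_i into A_j iff the type B_i chosen for w_i contains b_j (1-based).

    Returns None if some A_j is not a triple with size sum N; for a sequent
    whose counts balance, this cannot happen (Lemma 6).
    """
    parts: Dict[int, List[int]] = {j: [] for j in range(1, m + 1)}
    for i, f in enumerate(w_types, start=1):
        bs = [p for p in prim_occurrences(f) if p.startswith('b')]
        if len(bs) != 1:
            return None   # every type in l(w_i) contains exactly one b_j
        parts[int(bs[0][1:])].append(i)
    ok = all(len(ix) == 3 and sum(sizes[i - 1] for i in ix) == N
             for ix in parts.values())
    return parts if ok else None


def slash(a: Formula, args: Sequence[Formula]) -> Formula:
    """A/B_n . ... . B_1 with args = [B_n, ..., B_1]."""
    for arg in reversed(args):
        a = ('/', a, arg)
    return a


def chosen_types(k: Sequence[int], sizes: Sequence[int], m: int) -> List[Formula]:
    """B_1, ..., B_3m when w_i is assigned the type for triple k[i-1]."""
    return [slash('d', (['d'] if i < 3 * m else [])
                  + [f"b{k[i - 1]}"] + [f"c{k[i - 1]}"] * sizes[i - 1])
            for i in range(1, 3 * m + 1)]


sizes = [6, 7, 7, 6, 7, 7]   # m = 2, N = 20
print(read_off_partition(chosen_types([1, 1, 1, 2, 2, 2], sizes, 2), sizes, 2, 20))
# {1: [1, 2, 3], 2: [4, 5, 6]}
print(read_off_partition(chosen_types([1, 1, 2, 1, 2, 2], sizes, 2), sizes, 2, 20))
# None
```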



4 Conclusion 

We have defined a variant of Lambek's original calculus of types that allows abstracted-over categories to permute freely. Grammars based on SDL can generate any context-free language and more than that. The parsing problem for SDL, however, we have shown to be NP-complete. This result indicates that efficient parsing for grammars that allow for large numbers of unbounded dependencies from within one node may be problematic, even in the categorial framework. Note that the fact that this problematic case does not show up in the correct analysis of normal NL sentences does not mean that a parser would not have to try it, unless some arbitrary bound on that number is assumed. For practical grammar engineering one can devise the motto: avoid accumulation of unbounded dependencies by whatever means.

On the theoretical side, we think that this result for SDL is also of some importance, since SDL exhibits a core of logical behaviour that any (Lambek-based) logic must have which accounts for non-peripheral extraction by some form of permutation. Hence, this result increases our understanding of the necessary computational properties of such richer systems. To our knowledge, the questions whether the Lambek calculus itself or its associated parsing problem are NP-hard are still open.